#linguistic corpus
stephen9260 · 9 months ago
Personally I believe the world would benefit greatly from the existence of corpora from fanfiction.
Imagine a corpus that draws its data from every single ao3 fanfiction.
The number of occurrences for the most painfully specific words would be off the charts.
The collocations would be insane.
I am haunted by the possibility of a corpus in which the word "oh" occurs a million times and half of them are followed by another "oh".
Delightful
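A rough sketch of how the "oh" followed by "oh" question could be checked, assuming the corpus is available as plain text. The tokenizer, the function names, and the sample sentence are illustrative assumptions, not part of any real AO3 corpus or tool.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase word tokens are enough for a rough count.
    return re.findall(r"[a-z']+", text.lower())

def self_repeat_stats(tokens, word):
    # Total occurrences of `word`, and how many are immediately
    # followed by the same word.
    total = sum(1 for t in tokens if t == word)
    repeats = sum(
        1 for a, b in zip(tokens, tokens[1:]) if a == word and b == word
    )
    return total, repeats

# Hypothetical sample text standing in for the corpus.
sample = "Oh. Oh, oh no. He said oh and left."
tokens = tokenize(sample)
total, repeats = self_repeat_stats(tokens, "oh")

# Adjacent word pairs, a crude stand-in for collocation counts.
bigram_counts = Counter(zip(tokens, tokens[1:]))
print(total, repeats, bigram_counts.most_common(1))
```

Dividing `repeats` by `total` gives the share of "oh" tokens followed by another "oh". Real collocation work usually applies an association measure rather than raw adjacent-pair counts.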
3 notes · View notes
lingthusiasm · 4 months ago
Bonus 90: Don't you love to do a "do" episode?
We do love the word "do", and we hope you do too. Don't you want to know what the word "do" can do?  
In this bonus episode, Gretchen and Lauren get enthusiastic about the word "do"! We talk about the various functions of "do" as illustrated by lyrics from ABBA and other pop songs, what makes the word "do" so unique in English compared to other languages, and the drama of how "do" caught on and then almost got driven out again. Listen to this episode about do, and get access to many more bonus episodes by supporting Lingthusiasm on Patreon.
68 notes · View notes
linguisticdiscovery · 1 year ago
What is corpus linguistics?
Linguists need large databases of language in use in order to study how language works.
A language database is called a corpus (Latin ‘body’, pl. corpora), and a corpus consists of a collection of texts (they don’t have to be actual text; a “text” is any cohesive discourse event).
The field of corpus linguistics focuses on how to build and use corpora and run statistical analyses on the resulting data.
Here’s a very accessible introduction to corpus linguistics:
Tumblr media
58 notes · View notes
superlinguo · 1 year ago
hapax legomenon and automated email replies
While I’ve been on leave in 2023 I’ve had an automated email reply set up to direct people who email me to the most relevant alternative contact. Because I know that some people are stuck emailing me (sorry bosses, sorry mailing lists), I wanted to add a reminder about the magic of email filters, and couldn’t resist using it to share a little fact about corpus linguistics:
Sick of this automated reply?

If you’d like to not get automated replies from me, you can filter them by creating a rule. The best rule will probably be to filter anything that has the phrase "a hapax legomenon is a word or an expression that occurs only once within a corpus of texts" in the body of the email. That’s rare enough that it currently doesn’t turn up anywhere on the internet when I search it as a string with DuckDuckGo.
Of course, by the time I’m back from leave this post will be up and my autoreply won’t technically be correct anymore!
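The definition quoted in the autoreply can be turned into a small sketch: count every word across a set of texts and keep those that occur exactly once. The function name, the simple `\w+` tokenizer, and the two-sentence corpus are assumptions for illustration only.

```python
import re
from collections import Counter

def hapax_legomena(texts):
    # Count word forms across the whole corpus, not per text.
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    # A hapax legomenon occurs exactly once in the corpus.
    return sorted(word for word, n in counts.items() if n == 1)

# Hypothetical two-text corpus.
corpus = ["the cat sat", "the dog sat down"]
hapaxes = hapax_legomena(corpus)
print(hapaxes)
```

Whether something counts as a hapax depends on the corpus chosen, which is why publishing the phrase in this post changes the answer.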
29 notes · View notes
polyphanes · 9 months ago
On Poimandrēs
The classical Hermetic literature (the Corpus Hermeticum, the Asclepius, the Stobaean Hermetic Fragments, etc.) generally takes one of several formats: a monologue-type musing or speech (e.g. CH III or CH VII), a letter from a teacher to a student or between students sharing their wisdom (e.g. CH XIV or CH XVI), or a dialogue between a teacher and student (e.g. CH I or CH IV). By far the most…
11 notes · View notes
raffaellopalandri · 2 years ago
Book of the Day - A Practical Handbook of Corpus Linguistics
Today’s Book of the Day is A Practical Handbook of Corpus Linguistics, edited by Magali Paquot and Stefan Th. Gries in 2021 and published by Springer. Magali Paquot is a permanent FNRS research associate at the Centre for English Corpus Linguistics, UCLouvain. She is co‐editor in chief of the International Journal of Learner Corpus Research, a founding member of the Learner Corpus Research…
Tumblr media
69 notes · View notes
cornerihaunt · 3 months ago
i'm thinking of doing a corpus study with taylor's discography and this popped up in the keywords results and to be fair it reads like poetry
Tumblr media
3 notes · View notes
shimyereh · 2 years ago
Got accepted to my first academic conference!
46 notes · View notes
atthebell · 1 year ago
i will say yesterday i caught part of bad's stream where he was explaining how ai and llms work and it was a pretty good explanation aside from the dirt block placing
3 notes · View notes
gredi-bird · 1 year ago
so we've got terms like "actually autistic" to distance from vocal but unhelpful crowds like autism speaks, and i think that's great.
but i feel like i need one that's like "actually AI" so i can tag and follow stuff about the actual science and helpful application of machine learning stuff without the Build-Your-Own Waifu crowd invading my dash
4 notes · View notes
lingthusiasm · 11 months ago
Kat: Yeah. Computers are super, super good at counting. They’re super, super good at finding and identifying these strings. But they’re not very good at the analysis bit. We don’t want our computer to do the analysis for us. We want to be very aware of the kind of software and the kind of programming that goes into it that give us the results. Because we as humans are fantastically sensitive to language. That’s where the human element comes in. It’s why we don’t just leave it all to the computers to just do as they will with it.
Gretchen: It’s really a lot more of a partnership between the computer showing you some things and the human making meaning out of that.
Kat: Exactly. It’s meant to be a partnership where you play to each other’s strengths. You let the computer do the bit it’s good at, and then you do the bit you’re good at.
Excerpt from Lingthusiasm episode: Corpus linguistics and consent - Interview with Kat Gupta
Listen to the episode, read the full transcript, or check out more links about language and technology, and the history of language
173 notes · View notes
linguisticdiscovery · 1 year ago
The many senses of run
How do you define the word run? You probably think of something like ‘fast pedestrian motion’, but what about the use of run in these examples?
There are three boats that run from the mainland to the Island
On my way to the elevator, I ran into Pete
the bench, which numerous times rebuked the Attorney General for letting his witnesses run on
The tears ran down my face
Colors on the towels…
Tumblr media
13 notes · View notes
linguisticalities · 1 year ago
Link
2 notes · View notes
polyphanes · 4 months ago
Reading the Hermetica: CH XVI
For this week’s Reading the Hermetica discussion, we’re continuing our reading and discussion of the Corpus Hermeticum (CH), specifically Book 16 (CH XVI).  This text is entitled “Definitions of Asklēpios to King Ammōn (on God, matter, vice, fate, the Sun, intellectual essence, divine essence, mankind, the arrangement of the plenitude, the seven stars, and mankind according to the image)”.  As…
2 notes · View notes
nihongoseito · 2 years ago
something interesting i learned from a second-language acquisition lecture today, apparently if a lexical (vocab) item appears less than 20 times per million words it’s not worth teaching lol
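a quick sketch of what that cutoff looks like as arithmetic, assuming you have a raw count and the corpus size in tokens. the function names and the example numbers are made up for illustration; the 20-per-million threshold is just the one reported from the lecture.

```python
def per_million(count, corpus_size):
    # Normalized frequency: occurrences per one million tokens.
    return count * 1_000_000 / corpus_size

def meets_threshold(count, corpus_size, threshold=20.0):
    # Threshold of 20 per million is taken from the lecture claim above.
    return per_million(count, corpus_size) >= threshold

# Hypothetical counts in a hypothetical 10-million-token corpus.
rare = per_million(150, 10_000_000)
common = per_million(400, 10_000_000)
print(rare, meets_threshold(150, 10_000_000))
print(common, meets_threshold(400, 10_000_000))
```

normalizing per million is what lets counts from corpora of different sizes be compared at all.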
5 notes · View notes
newdayslinguine · 4 days ago
Just casually watching an ex mormon video and she pulls out the CORPUS LINGUISTICS?? I see you girl
1 note · View note